Bioinformatics Advances — Latest Matching Preprints

1

LocusBlend: Flexible multi-index regional visualization of genomic association signals

Yang, C.; Cook, N.; Zeng, Y.; Fu, T.; Budde, J.; Cruchaga, C.; Belloy, M. E.

2026-07-21 genetic and genomic medicine 10.64898/2026.07.15.26358129 medRxiv

Top 2%

3.3%

Summary: It has become standard practice to visualize regional signals from genome-wide association studies (GWAS) using LocusZoom plots. Similarly, GWAS signals are compared to regionally matched quantitative trait loci (QTLs; i.e., variant-to-gene regulation data) using LocusCompare plots to aid assessment of candidate trait-related genes. Despite broad usage, these tools annotate variants by linkage disequilibrium (LD) to a single lead or index variant. This single-index representation has limitations for visualizing complex loci that contain multiple independent signals. We present LocusBlend, an interactive web application for multi-index, LD-blended visualization of genomic loci. LocusBlend supports one or two genomic association summary-statistic datasets and one to three index variants, producing multi-index, color-blended LocusZoom plots and matching LocusCompare visualizations. Applications to Alzheimer's disease GWAS and QTL signals illustrate that LocusBlend enables visualization and separation of independent signals despite shared LD and high genomic complexity. Overall, LocusBlend aims to help researchers handle the continuously expanding complexity of human genomics findings. Availability and Implementation: LocusBlend is freely available at https://locusblend.wustl.edu. Publication-ready plots are generated in 1 min. Source code, documentation, example datasets, input templates, and reproducibility instructions are available at https://github.com/BelloyLab/LocusBlend. LocusBlend is implemented in Python using Streamlit, Plotly, and PLINK. Supplementary Information: Supplementary data are available online.
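The idea of coloring each variant by its LD with several index variants at once can be pictured with a small sketch. The abstract does not specify LocusBlend's blending rule, so the rule below (hue as the r²-weighted average of per-index colors, intensity set by the strongest r²) and the names `blend_color` and `GREY` are hypothetical illustrations, not the tool's implementation.

```python
# Hypothetical r²-weighted color blending for multi-index LD plots.
# NOT LocusBlend's documented algorithm; it only illustrates the concept.

RGB = tuple[float, float, float]

GREY: RGB = (0.85, 0.85, 0.85)  # color for variants in no LD with any index


def blend_color(r2_by_index: list[float], index_colors: list[RGB],
                background: RGB = GREY) -> RGB:
    """Mix index-variant colors in proportion to a variant's r² with each index."""
    if len(r2_by_index) != len(index_colors):
        raise ValueError("need one color per index variant")
    total = sum(r2_by_index)
    if total == 0:
        return background
    # Hue: r²-weighted average of the index colors.
    mix = tuple(
        sum(r2 * color[ch] for r2, color in zip(r2_by_index, index_colors)) / total
        for ch in range(3)
    )
    # Intensity: the strongest LD to any index decides how far to move from grey.
    strength = max(r2_by_index)
    return tuple(background[ch] * (1 - strength) + mix[ch] * strength
                 for ch in range(3))


RED: RGB = (1.0, 0.0, 0.0)
BLUE: RGB = (0.0, 0.0, 1.0)
# A variant in strong LD with index A and weak LD with index B comes out reddish-purple.
print(blend_color([0.8, 0.2], [RED, BLUE]))
```

Under this rule a variant in equal, perfect LD with two indices is drawn as an even mix of their colors, which is one simple way to flag shared LD between signals.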

2

Benchmarking Speech Recognition Models for Medical Consultations in Latin American Spanish: A Comparative Evaluation with Fine-Tuning

Carrillo, R. M.; Carbajal Serrano, A.; Condori Pinedo, P. S.

2026-07-16 public and global health 10.64898/2026.07.14.26358062 medRxiv

Top 4%

1.2%

BACKGROUND: Artificial intelligence (AI) medical scribes rely on speech-to-text (STT) models for transcription, but evaluations of STT models in non-English settings remain scarce. We benchmarked ten STT models on medical consultations in Latin American (LatAm) Spanish and assessed whether fine-tuning improves transcription accuracy. METHODS: We used ten YouTube videos depicting medical consultations, with human transcriptions as the ground truth. Five open-source models were evaluated (Whisper Large, Whisper Large v3, Whisper Large v3 Turbo, Voxtral Mini 3B, and Canary 1B v2), as were five closed-source models (gpt-4o-transcribe, gpt-4o-mini-transcribe, gemini-2.5-pro, Eleven Labs, and Assembly AI). Whisper Large v3 was fine-tuned, with one video withheld from training. Performance on the withheld video was assessed using Word Error Rate (WER), Character Error Rate (CER), BLEU score, ROUGE-L, BERTScore, and semantic similarity. RESULTS: None of the fine-tuning iterations outperformed the vanilla Whisper Large v3. On the withheld video, Gemini-2.5-pro was the best-performing closed-source model in four of six metrics. Compared with the closed-source models, the fine-tuned model never performed best; compared with the open-source models, however, it performed better across metrics, for instance BLEU score (63% vs 58% for the second-ranking model), BERTScore (89% vs 86%), semantic similarity (89% vs 83%), and CER (19% vs 20%). CONCLUSIONS: Whisper Large v3 and its fine-tuned variant are the best open-source STT models for transcribing medical conversations in LatAm Spanish. These findings provide an evidence base for developing AI medical scribes tailored to Spanish-speaking LatAm.
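WER and CER, the first two metrics listed above, are edit distances normalized by the length of the reference transcript. A minimal sketch, assuming simple lowercasing and whitespace tokenization (the study's text normalization is not described in the abstract); the function names are illustrative.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: substitutions, deletions and insertions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (or match)
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    if not ref:
        raise ValueError("empty reference")
    return edit_distance(ref, hypothesis.lower().split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate = character-level edit distance / reference length."""
    ref = list(reference.lower())
    if not ref:
        raise ValueError("empty reference")
    return edit_distance(ref, list(hypothesis.lower())) / len(ref)


# One inserted word against a four-word reference gives WER = 1/4.
print(wer("el paciente tiene fiebre", "el paciente tiene fiebre alta"))
```

Because both metrics divide by the reference length, an insertion-heavy hypothesis can push either one above 1.0.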

3

Multi-Agent Dynamic Refinement Outperforms Static RAG in Clinical Reasoning for Complex Nephrology Cases

Yano, Y.; Kakizaki, H.; Nagasu, H.; Kishi, S.; Koshida, T.; Nihei, Y.; Hirano, A.; Sugawara, Y.; Imaizumi, T.; Osakabe, Y.; Sakaguchi, Y.; Nangaku, M.; Mori, H.; Naito, T.; Ohashi, M.; Maruyama, S.; Matsui, I.; Isaka, Y.; Okada, H.; Suzuki, Y.; Kashihara, N.

2026-07-16 nephrology 10.64898/2026.07.15.26358121 medRxiv

Top 4%

1.1%

Background: Large language models (LLMs) struggle with dynamic, longitudinal clinical reasoning. We developed a Multi-Stage Iterative Clinical Reasoning Agent framework to address this gap and systematically decouple the clinical efficacy of static retrieval-augmented generation (RAG) from dynamic self-refinement. Methods: Ten complex longitudinal nephrology cases, rigorously selected via a modified Delphi consensus technique, were blindly evaluated by four board-certified nephrologists and a multi-model AI panel. We compared three architectures across nine cognitive steps: (Model A) a baseline frontier LLM, (Model B) an LLM augmented with static guideline-based RAG, and (Model C) our proposed multi-agent framework featuring RAG integrated with iterative self-critique and refinement. Results: In human evaluations (20-point scale), Model C (mean 17.2, SD 1.2) significantly outperformed both Model A (16.1, 1.3) and Model B (16.2, 1.2) (P < 0.001). Implementing static RAG (Model B) yielded no significant improvement over the baseline. Automated AI evaluations (15-point scale) corroborated these findings: Model C (14.7, 0.6) outscored Model A (14.2, 0.9, P < 0.001) and Model B (14.3, 0.9, P = 0.01). While monolithic models exhibited severe score degradations in planning-heavy tasks such as dynamic differential diagnoses, the multi-agent framework effectively intercepted error cascades, achieving significantly higher diagnostic accuracy (mean 17.6, P = 0.019) and therapeutic management scores (17.3, P = 0.002). Conclusions: Static knowledge retrieval alone fails to enhance frontier LLM performance in longitudinal medical reasoning. Distributing clinical workflows into a multi-agent dynamic refinement pipeline significantly improves reasoning completeness, intercepts error cascades, and safely resolves planning bottlenecks in complex patient care.
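The difference between Model B (single-pass retrieval) and Model C (retrieval plus iterative self-critique) can be outlined as a draft-critique-revise loop. This is a hypothetical sketch, not the authors' framework: `draft`, `critique` and `revise` stand in for LLM or RAG calls and are passed in as plain functions, so the control flow runs without any model.

```python
from typing import Callable, List


def run_with_refinement(case: str,
                        draft: Callable[[str], str],
                        critique: Callable[[str, str], List[str]],
                        revise: Callable[[str, str, List[str]], str],
                        max_rounds: int = 3) -> str:
    """Draft an answer, then alternate critique and revision until the critic
    raises no issues or the round budget is spent (hypothetical outline)."""
    answer = draft(case)
    for _ in range(max_rounds):
        issues = critique(case, answer)
        if not issues:  # critic is satisfied: stop early
            break
        answer = revise(case, answer, issues)
    return answer


# Stub agents: the critic flags a missing differential "B" until it is added.
draft_stub = lambda case: "ddx: A"
critique_stub = lambda case, ans: [] if "B" in ans else ["B"]
revise_stub = lambda case, ans, issues: ans + ", " + issues[0]

print(run_with_refinement("case-1", draft_stub, critique_stub, revise_stub))
```

With `max_rounds=0` the loop degenerates to the single-pass behaviour of a static pipeline, which is the contrast the study draws.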

4

ReCo: a self-configuring and self-extending agentic framework for biomedical research

Tzanis, E.; Klontzas, M. E.

2026-07-16 health informatics 10.64898/2026.07.14.26358025 medRxiv

Top 4%

0.8%

This study presents ReCo (Research Cosmos), a self-configuring and self-extending agentic research framework for the biomedical domain. ReCo is orchestrated by a large language model that interacts with native computing tools, bundled Model Context Protocol (MCP) servers, structured skills, persistent project memory, and a desktop interface. Its bundled MCP servers provide biomedical analysis capabilities while serving as implementation paradigms for integrating new computational and AI frameworks. Structured skills encode procedures for environment configuration and framework ingestion, enabling ReCo to inspect repositories, manuscripts, or local codebases; identify dependencies and execution patterns; create isolated runtime environments; and design and implement MCP interfaces. Self-extension was evaluated using five heterogeneous systems: the Merlin computed tomography foundation model, the MAISI-v2 medical image synthesis framework, the asari liquid chromatography-mass spectrometry workflow, the DosimeTron agentic radiation-dosimetry platform, and the Orthanc DICOM server. ReCo successfully operationalized all five systems and completed predefined functional evaluations. Re-hosted DosimeTron outputs demonstrated near-perfect agreement with the reference pipeline across 651 organ observations (Pearson correlation and Lin concordance correlation coefficient, 0.99999; mean absolute percentage difference, 0.37%). Notably, ReCo configured Orthanc as a PACS-like coordination layer, integrated it with DosimeTron, Merlin, and TotalSegmentator, and orchestrated data retrieval, analysis, and return of valid DICOM RTSTRUCT, RTDOSE, and Structured Report objects. ReCo provides a unified environment for configuring, documenting, and operationalizing heterogeneous biomedical frameworks, reducing technical barriers to the adoption and integration of emerging computational and AI methods. The official open-source ReCo GitHub repository is available at: https://github.com/eltzanis/ReCo

5

CuGen: A GPU-accelerated framework for large-scale genomics

Kiiskinen, T.; Richland, J.; Wang, W.; Lu, W. S.; Balasubramanian, N.; Hastie, T.; Tibshirani, R.; Rivas, M. A.

2026-07-17 genetic and genomic medicine 10.64898/2026.07.15.26358178 medRxiv

Top 4%

0.8%

Biobank-scale genomic analyses remain computationally expensive and CPU-bound, particularly when adjusting for confounding. Here, we present CuGen, a GPU-accelerated framework for large-scale genomics. CuGen uses UltraLasso, a novel hierarchical application of univariate-guided sparse regression (uniLasso), to select a compact, phenotype-informed active set of fewer than 30,000 variants. This achieves robust leave-one-chromosome-out (LOCO) confounding control, enabling both downstream GWAS and in-sample fine-mapping. Additionally, we introduce the .cugen file format, a genotype representation designed for memory-optimized, high-throughput streaming and random access on GPU hardware. Building on this substrate, we provide a general GPU-accelerated genomics toolkit handling polygenic prediction, data manipulation, quality control, analysis, and visualization. We demonstrate CuGen's efficacy in the UK Biobank with up to 408,624 individuals, where the full GWAS pipeline and fine-mapping against 6.8 million imputed variants complete in approximately 10 minutes on a single high-throughput GPU with 80 GB of memory. The pipeline scales efficiently to massive phenome-wide analyses with sublinear resource consumption.
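Leave-one-chromosome-out (LOCO) adjustment means that, when testing variants on one chromosome, the polygenic covariate is built only from active-set variants on the other chromosomes, so a variant's own signal is not absorbed by the adjustment. A minimal sketch, assuming per-variant effects have already been estimated (the UltraLasso selection step is not reproduced) and using hypothetical names and toy dosages:

```python
from collections import defaultdict


def loco_offsets(dosages: dict[str, list[float]],
                 active_set: list[tuple[str, int, float]],
                 chromosomes: list[int]) -> dict[int, list[float]]:
    """Per-chromosome LOCO polygenic offsets (hypothetical helper).

    dosages: variant id -> per-individual dosage (0..2)
    active_set: (variant id, chromosome, effect) for each selected variant
    Returns chromosome -> per-individual score using all OTHER chromosomes.
    """
    n = len(next(iter(dosages.values())))
    total = [0.0] * n
    by_chrom: dict[int, list[float]] = defaultdict(lambda: [0.0] * n)
    for variant, chrom, effect in active_set:
        for i, g in enumerate(dosages[variant]):
            total[i] += effect * g
            by_chrom[chrom][i] += effect * g
    # Subtract each chromosome's own contribution from the genome-wide score.
    return {c: [t - s for t, s in zip(total, by_chrom[c])] for c in chromosomes}


toy_dosages = {"v1": [0, 1, 2], "v2": [2, 1, 0]}
toy_active = [("v1", 1, 0.5), ("v2", 2, -1.0)]
print(loco_offsets(toy_dosages, toy_active, [1, 2, 3]))
```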

6

Machine learning models to improve targeting of blood culture testing

Forrest-Hammond, R. W.; Gupta, R.; McVean, G.; Noursadeghi, M.; O'Grady, J.; Samuels, T. H.; Eyre, D. W.

2026-07-20 infectious diseases 10.64898/2026.07.17.26358320 medRxiv

Top 5%

0.5%

Background Bloodstream infections are a major cause of mortality, yet the primary testing method, blood culture, has low positivity (<10%) and a turnaround time of 24-48 hours. Many cultures are taken from patients at low risk of infection, while some bloodstream infections are diagnosed late or missed entirely. We aimed to develop and externally validate machine learning models to improve targeting of blood culture testing. Methods In this retrospective cohort study, we used routinely collected clinical and laboratory data available around the time of culture collection from a large multi-site NHS trust (Oxford University Hospitals; Infections in Oxfordshire Research Database), between 1 January 2016 and 17 March 2025. All blood cultures taken from adults and children were included. XGBoost models were trained to predict pathogenic blood culture positivity using a temporal split (training before 1 January 2024; held-out test thereafter). External validation used emergency department data (between 1 May 2019 and 30 April 2024) from University College London Hospitals. An additional analysis examined reallocating blood cultures towards the highest-risk untested admissions. Findings 294,064 cultures were included (positivity 5.6%). In the temporal hold-out test set (n=46,339), the AUROC (area under the receiver operating characteristic curve) was 0.853 (95% CI 0.846-0.860), rising to 0.876 in emergency department patients, and the model was well calibrated (slope 1.046). In external validation (n=37,326), AUROC was 0.847 (95% CI 0.839-0.856) with preserved calibration. In a simulated resource-neutral reallocation, replacing the 10,000 lowest-risk sent cultures with the highest-risk untested emergency admissions yielded 627 additional positive cultures (28.3% relative increase in yield). Performance was reduced when restricted to data available at the point of culture collection (AUROC 0.769, 95% CI 0.760-0.779).
Interpretation An externally validated, well calibrated machine learning model built from broadly available, routinely collected data could improve blood culture yield without increasing testing volume, supporting resource-neutral diagnostic stewardship across NHS sites.
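AUROC, the headline metric above, equals the probability that a randomly chosen positive culture receives a higher predicted risk than a randomly chosen negative one (ties count half). A minimal pairwise sketch; it is quadratic in the number of cultures, so a rank-based implementation would be needed at the scale of this cohort. Names and scores are illustrative.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


# 3 of the 4 positive/negative pairs are ranked correctly.
print(auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```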

7

FoodScribe: an open-source semantic framework for nutrient estimation from free-text dietary records

Gouda, H.; Sala Climent, M.; Agongo, J.; Gaikwad, S. P.; Nattakom, A.; Zhao, H. N.; Xing, S.; Boland, B. S.; Holt, T.; Guma, M.; Dorrestein, P. C.

2026-07-17 nutrition 10.64898/2026.07.15.26358181 medRxiv

Top 6%

0.4%

Efficiently summarizing dietary records at scale remains a persistent bottleneck in nutritional epidemiology. We present FoodScribe, which translates free-text meal descriptions into quantitative nutrient profiles by combining ingredient parsing with nutrient retrieval from the USDA FoodData Central (FDC) database. Benchmarked with three LLM providers on the Nutribench dataset, FoodScribe annotated 3,807 meal descriptions in 2.5 hours, a task that would otherwise require substantial manual effort from trained nutritionists. FoodScribe achieved F1 scores of 0.79-0.89 for macronutrient estimation, with models performing better for protein than for fat. Application to a Mediterranean diet intervention cohort indicated dietary shifts consistent with the intervention pattern based on model-derived estimates. Integration with metabolomics data suggested that fiber and vegetable intake were positively associated with a fecal metabolite cluster.
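Once a meal has been parsed, the retrieval step amounts to scaling per-100 g nutrient values by each ingredient's portion weight and summing. A minimal sketch, assuming the parser has already produced (food, grams) pairs and that nutrient values are expressed per 100 g; the table entries are placeholders, not FoodData Central values, and no FDC API call is shown.

```python
# PLACEHOLDER per-100 g values for illustration only (not FoodData Central data).
EXAMPLE_TABLE = {
    "food_a": {"protein_g": 2.0, "fat_g": 0.5},
    "food_b": {"protein_g": 0.0, "fat_g": 100.0},
}


def meal_nutrients(parsed_items: list[tuple[str, float]],
                   per_100g_table: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum nutrients over parsed (food, grams) pairs using per-100 g values."""
    totals: dict[str, float] = {}
    for food, grams in parsed_items:
        for nutrient, per_100g in per_100g_table[food].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * grams / 100.0
    return totals


# 150 g of food_a plus 10 g of food_b.
print(meal_nutrients([("food_a", 150), ("food_b", 10)], EXAMPLE_TABLE))
```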

8

Switching from febuxostat to dotinurad in patients with chronic kidney disease and hyperuricemia: a single-center, non-randomized study

Irifuku, T.; Kashiwado, S.; Masaki, T.

2026-07-18 nephrology 10.64898/2026.07.16.26358294 medRxiv

Top 7%

0.2%

Recently, an observational study demonstrated that a lower fractional excretion of uric acid (FEUA) is significantly associated with a higher risk of kidney failure. This study aimed to assess the efficacy of switching from febuxostat to dotinurad, which increases FEUA, in patients with chronic kidney disease (CKD) and hyperuricemia (HUA). This was a non-randomized, open-label, single-center, prospective, single-arm study involving 60 patients with CKD and HUA who were receiving febuxostat. Participants first underwent a 3-month observation period, followed by a 3-month intervention period during which treatment was switched from febuxostat to dotinurad. The primary outcomes were the changes from baseline to 3 months after switching in the estimated glomerular filtration rate (eGFR) calculated from serum creatinine (eGFRcreat) and from serum cystatin C (eGFRcys), as well as in serum uric acid (UA) levels. The secondary outcome was the correlation of ΔFEUA with the changes in eGFRcreat (ΔeGFRcreat) and in eGFRcys (ΔeGFRcys). During the observation period, mean eGFRcreat decreased significantly. At baseline, eGFRcreat was 36.0 ± 15.2 mL/min/1.73 m² and serum UA was 5.5 ± 1.2 mg/dL. During the intervention period, eGFRcreat increased, in contrast to a significant decline in eGFRcys. After 3 months on dotinurad, mean serum UA increased significantly from 5.5 ± 1.2 to 6.1 ± 1.4 mg/dL, despite a significant elevation in FEUA. Both ΔeGFRcreat and ΔeGFRcys after switching to dotinurad were positively correlated with ΔFEUA. Switching from febuxostat to dotinurad resulted in discrepant changes in eGFRcreat and eGFRcys, suggesting that renal function should be assessed carefully after switching. Additionally, the risk of elevated serum UA levels should be considered when switching from febuxostat to dotinurad in patients with CKD.
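FEUA, the quantity dotinurad is reported to raise, is conventionally computed from paired urine and serum measurements of uric acid and creatinine: FEUA (%) = (urine UA × serum creatinine) / (serum UA × urine creatinine) × 100. A minimal sketch of that standard formula; the function name and example values are illustrative, and the study's sampling protocol is not described in the abstract.

```python
def feua_percent(urine_ua: float, serum_ua: float,
                 urine_cr: float, serum_cr: float) -> float:
    """Fractional excretion of uric acid (%).

    Each analyte must use the same unit in urine and serum (e.g. mg/dL),
    so the units cancel and the result is dimensionless.
    """
    if serum_ua <= 0 or urine_cr <= 0:
        raise ValueError("serum UA and urine creatinine must be positive")
    return 100.0 * urine_ua * serum_cr / (serum_ua * urine_cr)


# Illustrative values only: a change in FEUA between two visits (ΔFEUA).
before = feua_percent(urine_ua=50.0, serum_ua=5.0, urine_cr=100.0, serum_cr=1.0)
after = feua_percent(urine_ua=60.0, serum_ua=5.0, urine_cr=100.0, serum_cr=1.0)
print(before, after - before)
```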

9

The Variance-Stabilizing Transformation for the Poisson Rate Ratio: Closed-Form Confidence Intervals

Ng, S.-P.

2026-07-18 epidemiology 10.64898/2026.07.16.26358255 medRxiv

Top 8%

0.2%

The incidence rate ratio R is the standard measure for comparing event rates in clinical trials and epidemiology. In vaccine trials, the vaccine efficacy is VE = 1 - R. When events are rare, the two arm counts are Poisson. The estimator of R is heteroskedastic: its sampling variance depends on the rates being estimated, so no fixed-width interval covers correctly everywhere. The usual log-Wald interval is undefined at zero events and covers poorly at small counts. Early vaccine and drug-safety readouts fall in exactly this regime. We show that a single reparameterization collapses this bivariate problem to an effective one-parameter family with a quadratic variance function, whose variance-stabilizing transformation is 2 arcsinh(sqrt(R)). The reduction yields a closed-form confidence interval for R. Its two leading errors, a curvature bias and the variability of the estimated scale, each admit a closed-form correction with no tuning constants. In a Monte Carlo study of our seven arcsinh variants and five competitors, the +Curve+Stu variant covers within 0.002 of the nominal 0.95 for about 50 control and 5 treatment events. Its width is on par with the best competitor. It avoids the conservatism and zero-count breakdown of log-Wald and MOVER. For moderate counts, we recommend this interval; for sparser data, our Bar-Lev and Enis count-shift variant is more robust. The result is a ready-to-use, closed-form interval for the low-count regime. We illustrate it on early Covid-19 vaccine-efficacy readouts and provide reference implementations in R and Python.
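For orientation, the log-Wald interval criticized above and the arcsinh transform can each be written in a few lines. This sketch is not the authors' interval: their curvature and studentization corrections are not given in the abstract and are not reproduced. It shows that log-Wald breaks down at zero events, and checks numerically that 2 arcsinh(sqrt(r)) has derivative 1/sqrt(r(1 + r)), i.e. it is the stabilizing transform for a variance proportional to r(1 + r), one quadratic variance function. Names are illustrative.

```python
import math
from statistics import NormalDist


def log_wald_ci(x1: int, t1: float, x0: int, t0: float, level: float = 0.95):
    """Classical log-Wald CI for the rate ratio (x1/t1)/(x0/t0).

    Returns None when either count is zero, because log(0) and 1/0 are undefined.
    """
    if x1 == 0 or x0 == 0:
        return None
    r_hat = (x1 / t1) / (x0 / t0)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_log = math.sqrt(1 / x1 + 1 / x0)
    return r_hat * math.exp(-z * se_log), r_hat * math.exp(z * se_log)


def vst(r: float) -> float:
    """The transform named in the abstract: 2 * arcsinh(sqrt(r))."""
    return 2.0 * math.asinh(math.sqrt(r))


print(log_wald_ci(5, 1000.0, 50, 1000.0))  # 5 vs 50 events, equal person-time
print(log_wald_ci(0, 1000.0, 50, 1000.0))  # zero treatment events: None
# Vaccine efficacy interval from a rate-ratio interval (lo, hi): (1 - hi, 1 - lo).
```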

10

Efficient stochastic epidemic simulation via the Sellke construction

van Boven, M.; Bootsma, M. C.

2026-07-17 epidemiology 10.64898/2026.07.16.26358219 medRxiv

Top 8%

0.1%

Stochastic epidemic models are a cornerstone of infectious disease epidemiology and are often used to study intervention scenarios. However, large run-to-run variability can make intervention effects difficult to estimate precisely. We revisit the epidemic Sellke construction, which assigns each individual an infection threshold for the cumulative infection hazard such that, conditional on the thresholds, the epidemic trajectory becomes deterministic. This enables coupling of simulations with and without an intervention, yielding low-variance effect estimates even when outcomes such as final size or peak incidence vary widely between runs. We develop an exact, event-driven implementation that maintains infection and recovery events in priority queues. Cumulative infection-hazard updates require O(log N) time per event, yielding overall complexity O(E log N) for E events in a population of size N. The implementation achieves computational performance comparable to the classical Gillespie algorithm while naturally accommodating non-Markovian infectious periods and complex infectiousness profiles. We illustrate the approach using distance-dependent spread of avian influenza between poultry farms in the Netherlands and a multilayer population with households, schools, and workplaces. In both examples, coupling enables efficient within-run comparisons of intervention scenarios across stochastic realisations.
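The Sellke idea can be sketched for the simplest case, a homogeneously mixing SIR model. This is a simplified illustration, not the authors' implementation: with homogeneous mixing the hazard update is O(1) rather than O(log N), recoveries sit in a binary heap, and infectious periods are drawn as exponential only for convenience (any distribution could be substituted). Because thresholds and periods depend only on the seed, two runs that differ only in `beta` are coupled.

```python
import heapq
import math
import random


def sellke_sir_final_size(n: int, beta: float, mean_infectious_period: float,
                          n_initial: int, seed: int) -> int:
    """Event-driven SIR via the Sellke construction (homogeneous mixing).

    Returns how many initially susceptible individuals are ever infected.
    """
    if beta <= 0:
        return 0
    rng = random.Random(seed)
    n_sus = n - n_initial
    # Exp(1) thresholds, sorted: susceptibles are infected in this order.
    thresholds = sorted(rng.expovariate(1.0) for _ in range(n_sus))
    # Infectious periods, drawn independently of beta (this is the coupling).
    periods = [rng.expovariate(1.0 / mean_infectious_period) for _ in range(n)]

    t = 0.0
    pressure = 0.0                    # cumulative infection hazard Λ(t)
    recoveries = periods[:n_initial]  # recovery times of current infectives
    heapq.heapify(recoveries)
    next_sus = 0                      # next susceptible in threshold order

    while recoveries:
        rate = beta * len(recoveries) / n  # dΛ/dt while I(t) is constant
        t_recovery = recoveries[0]
        t_infection = (t + (thresholds[next_sus] - pressure) / rate
                       if next_sus < n_sus else math.inf)
        if t_infection < t_recovery:
            # Pressure reaches the next threshold: that individual is infected.
            pressure = thresholds[next_sus]
            t = t_infection
            heapq.heappush(recoveries, t + periods[n_initial + next_sus])
            next_sus += 1
        else:
            # Earliest recovery happens first; accumulate pressure up to it.
            pressure += rate * (t_recovery - t)
            t = t_recovery
            heapq.heappop(recoveries)
    return next_sus
```

For example, comparing `sellke_sir_final_size(1000, 2.0, 1.0, 5, seed)` with the same call at `beta=1.5` for each seed gives a paired estimate of an intervention's effect on final size; under this coupling the reduced-transmission run never has a larger final size.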

11

Malaria Pre-screening Technology Using Artificial Intelligence (AI)

Ibeto, O. O.; Nwoye, E. O.

2026-07-17 infectious diseases 10.64898/2026.07.15.26357432 medRxiv

Top 9%

0.1%

Malaria remains a severe health problem in endemic regions because people lack adequate diagnostic tools, leading to delayed medical care and elevated death rates. This research introduces a dual-mode artificial intelligence system that uses two complementary models to enhance malaria pre-screening and diagnosis. The patient-centered model uses multivariate logistic regression to analyze biosignals, including heart rate, body temperature, and oxygen saturation, collected through a wearable sensor prototype, together with a mobile interface for symptom analysis. It lets patients self-assess their level of need before scheduling a doctor's appointment. The clinician-centered model is a customized convolutional neural network trained on annotated microscopy images of red blood cells, achieving 94.84% accuracy, 95.71% precision, 93.87% recall, a 94.78% F1 score, and an area under the curve (AUC) of 0.84. The patient model achieved 94.6% accuracy and an AUC of 0.985 using a 70/30 train-test split. The two models form a layered diagnostic system that can operate independently or jointly to detect malaria at an early stage, especially in resource-limited areas. The findings demonstrate that integrating wearable biosignal data with image-based deep learning can produce dependable, scalable, and user-friendly systems for malaria pre-screening. Keywords: malaria diagnosis, artificial intelligence (AI), convolutional neural networks (CNN), wearable biosensors, multivariate logistic regression
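The patient-centered model scores biosignals with logistic regression. A minimal sketch of that scoring step; the coefficient values in `EXAMPLE_COEFS` are placeholders invented for illustration, not the study's fitted model, and the feature set is limited to the three biosignals named above.

```python
import math

# PLACEHOLDER coefficients for illustration only; NOT the study's fitted model.
EXAMPLE_COEFS = {"intercept": -40.0, "heart_rate": 0.03,
                 "temperature_c": 1.0, "spo2": -0.02}


def malaria_risk(heart_rate: float, temperature_c: float, spo2: float,
                 coefs: dict[str, float]) -> float:
    """Predicted probability from a logistic model on three biosignals."""
    z = (coefs["intercept"]
         + coefs["heart_rate"] * heart_rate
         + coefs["temperature_c"] * temperature_c
         + coefs["spo2"] * spo2)
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# With these placeholder coefficients, fever raises the predicted risk.
print(malaria_risk(100, 39.0, 95, EXAMPLE_COEFS), malaria_risk(100, 37.0, 95, EXAMPLE_COEFS))
```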

12

Statistical Inference and Power Analysis for Comparative F1 and Fβ Scores under Correlated Classifier Pairs

Hsu, C.-Y.; Liu, Q.; Shyr, Y.

2026-07-17 dermatology 10.64898/2026.07.15.26358166 medRxiv

Top 9%

0.1%

As machine learning and artificial intelligence systems are increasingly used in healthcare, rigorous evaluation of their classification performance has become critical. The F1 and Fβ scores are widely adopted metrics for assessing performance in imbalanced biomedical data. Recently, we introduced psF1, a unified statistical framework for inference and study design for single and comparative F1 and Fβ scores under the assumption of independent classifiers. In practice, however, benchmarking two classifiers on the same dataset creates a correlated paired setting. Ignoring this intrinsic dependency leads to overestimation of the standard error and a substantial loss of statistical power. To address this, we develop psF1pair, an advanced framework for statistical inference and power analysis that explicitly accounts for correlations between classifier pairs. Extensive simulation studies demonstrate the performance of psF1pair, and its utility is further illustrated through application to a real-world imaging classification system. As expected, higher correlation between classifiers yields narrower confidence intervals and enhanced statistical power. A freely available R package is provided to facilitate implementation, supporting accurate evaluation and study design for predictive and classification models in biomedical research.
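A paired comparison of F1 or Fβ must resample the same cases for both classifiers so that their correlation is preserved. psF1pair is an analytic framework whose formulas are not given here; the sketch below instead uses a paired percentile bootstrap, a different, swapped-in technique, purely to illustrate the pairing. Function names are illustrative.

```python
import random


def f_beta(y_true: list[int], y_pred: list[int], beta: float = 1.0) -> float:
    """F-beta = (1 + b²)·TP / ((1 + b²)·TP + b²·FN + FP); 0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def paired_bootstrap_ci(y_true, pred_a, pred_b, beta=1.0,
                        n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for F-beta(A) - F-beta(B) on the same test cases."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # SAME cases for both models
        yt = [y_true[i] for i in idx]
        diffs.append(f_beta(yt, [pred_a[i] for i in idx], beta)
                     - f_beta(yt, [pred_b[i] for i in idx], beta))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

Resampling one index list for both classifiers is what carries their correlation into the interval; drawing separate resamples per classifier would reproduce the independence assumption the abstract warns against.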

13

Analytical perturbation reveals hidden instability of biological phenotypes

Piorkowska, N. J.; Ostromecki, A.; Franik, G.; Bizon, A.

2026-07-16 endocrinology 10.64898/2026.07.13.26357916 medRxiv

Top 9%

0.1%

Background Unsupervised machine learning has become a cornerstone of computational phenotyping across clinical medicine, genomics, imaging, and multi-omics research. However, phenotype discovery relies on a sequence of analytical decisions - including missing-data handling, preprocessing, dimensionality reduction, clustering methodology, and stochastic initialization - that are rarely evaluated collectively. Although clustering stability has been extensively investigated, the robustness of complete analytical workflows remains largely unexplored. Results We developed an Analytical Perturbation Framework that systematically quantifies the robustness of phenotype discovery by perturbing complete unsupervised learning workflows rather than individual clustering algorithms. Using a real-world cohort of 1,286 women with polycystic ovary syndrome (PCOS), we generated 116 valid analytical pipelines comprising alternative preprocessing strategies, missing-data handling methods, dimensionality reduction approaches, clustering algorithms, and random initializations. Agreement between independently generated phenotype solutions was consistently low (median Adjusted Rand Index = 0.079), indicating substantial sensitivity of phenotype discovery to routine analytical decisions. Variance decomposition identified preprocessing as the largest contributor to phenotype instability (22.8%), followed by clustering methodology (14.6%), whereas stochastic initialization explained only 3.1% of the observed variability. At the patient level, most individuals exhibited reproducible phenotype assignments (median Patient Robustness Score = 0.719), although a substantial subgroup showed markedly lower assignment stability. 
Feature perturbation analyses identified follicle-stimulating hormone, anti-thyroglobulin antibodies, anti-thyroid peroxidase antibodies, total testosterone, luteinizing hormone, and androstenedione as the strongest contributors to computational robustness, rather than biological importance. Finally, phenotype solutions demonstrating greater computational robustness also exhibited greater biological coherence during independent validation.
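The Adjusted Rand Index used above to compare phenotype solutions can be computed from the contingency table of two label vectors. A minimal sketch of the standard Hubert-Arabie formula (agreement beyond chance, scaled so that identical partitions score 1); the function name is illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a: list, labels_b: list) -> float:
    """Adjusted Rand Index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have equal length")
    n = len(labels_a)
    # Pairs co-clustered in both partitions, and in each partition separately.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both all-singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Label names do not matter: a relabelled partition scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Values near 0, like the median of 0.079 reported above, mean agreement is barely better than random relabelling.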

14

Analytical Performance and 99th Percentile Upper Reference Limit of the Novel SPINCHIP High-Sensitivity Cardiac Troponin I Point-of-Care Assay

MacKenzie, J.; Aakre, K. M.; Paus, D.; Broughton, M. N.; Storvold, G. L.; Olberg, A.; Stenmark, S.; Booij, B. B.; Scott, S.; Michel-Busseret, S.; Octave, L.; Tveit, A.; Lyngbakken, M. N.; Nilsson, J.; Rosjo, H.

2026-07-20 emergency medicine 10.64898/2026.07.17.26357157 medRxiv

Top 10%

0.1%

BACKGROUND In line with International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) recommendations for high-sensitivity cardiac troponin assays, analytical validation and reference limit assessments are required to confirm that an assay meets performance criteria. This study evaluated the analytical performance and established the 99th percentile upper reference limit (URL) for the SPINCHIP High-Sensitivity Cardiac Troponin I (SPINCHIP hs-cTnI) point-of-care assay. METHODS Analytical performance characteristics, including the limit of blank (LoB), limit of detection (LoD), and limit of quantification (LoQ), were assessed. Additionally, 1,053 plasma samples and 1,055 whole-blood samples were used to determine the URL. Imprecision around the 99th percentile URL was evaluated as part of the analytical validation. High-sensitivity criteria were assessed by confirming measurable cTnI in ≥50% of healthy individuals (n=432 plasma; n=431 whole blood) and achieving imprecision <10% at the 99th percentile (plasma, n=960; whole blood, n=480). RESULTS SPINCHIP hs-cTnI demonstrated a LoB of 0.3 ng/L; LoDs of 0.8 ng/L (plasma) and 0.9 ng/L (whole blood); and LoQs of 1.1 ng/L (plasma) and 1.4 ng/L (whole blood). The analytical measuring range was 1.1-9,000 ng/L. Imprecision at the common 99th percentile URL (14 ng/L) was 5.8%; for men (URL=16 ng/L) 5.6% and for women (URL=10 ng/L) 6.3%. Greater than 85.2% (94.0% and 76.1% in men and women, respectively) of healthy individuals showed measurable cTnI above the LoD. CONCLUSIONS The SPINCHIP hs-cTnI assay meets the IFCC high-sensitivity requirements, demonstrating <10% imprecision at the 99th percentile, reliable low-concentration precision and cTnI detection in more than half of healthy individuals.
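The analytical quantities reported above follow standard definitions. A minimal sketch, assuming the parametric CLSI EP17-style formulas for LoB and LoD (LoB = mean of blank results + 1.645 SD; LoD = LoB + 1.645 SD of a low-concentration sample), imprecision as the coefficient of variation, and a nonparametric 99th percentile; the study's exact statistical procedures are not stated in the abstract, and the function names are illustrative.

```python
import statistics


def cv_percent(replicates: list[float]) -> float:
    """Imprecision as coefficient of variation (%) of replicate measurements."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def percentile_99(values: list[float]) -> float:
    """Nonparametric 99th percentile (linear interpolation, 'inclusive' method)."""
    return statistics.quantiles(values, n=100, method="inclusive")[98]


def limit_of_blank(blank_results: list[float], z: float = 1.645) -> float:
    """Parametric LoB: mean of blank results + z * SD of blank results."""
    return statistics.mean(blank_results) + z * statistics.stdev(blank_results)


def limit_of_detection(lob: float, low_sample_results: list[float],
                       z: float = 1.645) -> float:
    """Parametric LoD: LoB + z * SD of a low-concentration sample."""
    return lob + z * statistics.stdev(low_sample_results)


# Replicates scattered around a concentration near the URL give the CV there.
print(cv_percent([9.0, 10.0, 11.0]))
```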

15

How Do Nurses Make Clinical Decisions Via Remote Reviews: A Convergent Mixed-Methods Study

Zhang, Y.; Sutherland, S.; Greenway, K.; Stayt, L.

2026-07-17 nursing 10.64898/2026.07.15.26357946 medRxiv

Top 10%

0.1%

Background: Remote clinical reviews have become an integral component of contemporary nursing practice across community and acute care settings. Nurses increasingly make autonomous clinical decisions using telephone, video, and online/digital systems, often with limited sensory information and under conditions of uncertainty. However, empirical understanding of how nurses make clinical decisions via remote reviews remains limited. Aim: To explore and understand how registered nurses (RNs) make clinical decisions about patient care via remote reviews. Methods: A convergent mixed-methods design was employed. Quantitative data (analytic quantitative sample N=53) were collected using validated questionnaires that measured decision-making processes, physician-nurse collaboration, decision-making stress, and perceived decision-making ability. Qualitative data (N=23) were generated through semi-structured interviews. Data collection took place between October 2024 and April 2025. Quantitative data were analysed using descriptive statistics, correlation, and multiple regression. Qualitative data were analysed using framework analysis. Integration was achieved through pillar-building and theory-driven synthesis and illustrated by joint display tables. Results: Most nurses demonstrated a flexible decision-making style, integrating analytical and intuitive reasoning. Both analytical and intuitive processes were positively associated with perceived decision-making ability. Physician-nurse collaboration emerged as a strong predictor of decision-making confidence, while decision-related stress was not a significant predictor. Qualitative findings identified three themes: characteristics of remote review; making adaptive decisions shaped by both internal and external constraints and enablers; and external influencing factors. The integrated findings informed a theory-informed ICE framework to illustrate how nurses make clinical decisions via remote reviews.
Conclusion: Remote clinical decision-making is a dynamic cognitive-environmental process rather than a purely individual cognitive act. The ICE framework conceptualises this interaction, extending existing decision-making theories to digitally mediated care. Impact: Understanding remote decision-making supports training design, clinical governance, and the development of Artificial Intelligence-enhanced decision-support tools grounded in ecological bounded rationality. Patient or Public Contribution: Patient and public representatives contributed to stakeholder discussions that informed the development of the interview topic guide and the theoretical model. Patients or members of the public were not involved in recruitment, data collection, analysis, interpretation of findings, or preparation of the manuscript. Keywords: clinical decision-making, remote reviews, telehealth, nursing, mixed methods, ecological bounded rationality

16

Association between stage-specific sleep bout durations and obstructive sleep apnea severity: A variable-domain functional regression approach

Rahman, M. M.; Guha Niyogi, P.

2026-07-16 epidemiology 10.64898/2026.07.14.26358060 medRxiv

Top 10%

0.1%

The apnea-hypopnea index (AHI), the conventional metric of obstructive sleep apnea (OSA) severity, is typically studied using scalar summaries of sleep architecture, such as the total time spent in each sleep stage. Although clinically interpretable, these summaries fail to capture the temporal organization of overnight sleep-stage sequences and may obscure stage-specific associations with OSA severity. Modeling the complete sleep-stage trajectory provides substantially richer temporal information; however, because total sleep duration varies across individuals, sleep-stage trajectories are observed over subject-specific domains, limiting the applicability of conventional functional regression methods that assume a common observation interval. We therefore applied Variable-Domain Functional Regression (VDFR) to overnight polysomnographic data from the APPLES study (n = 1,103), treating the epoch-by-epoch sleep-stage sequence as a continuous, variable-length functional predictor of AHI. We compared three levels of sleep-stage granularity: five stages (Wakefulness, N1, N2, N3, REM), three stages (Wakefulness, Non-REM, REM), and binary staging (Wakefulness vs. Sleep). Functional sleep-stage terms were significant across all staging granularities and model structures (all p-values ≤0.001). Wake, N1, and N2 were positively associated with AHI, whereas N3 and REM were negatively associated, with REM exhibiting the strongest association. These effects were attenuated under coarser staging representations, highlighting the importance of preserving fine-grained sleep architecture. To our knowledge, this is the first application of VDFR to overnight polysomnographic data in OSA, showing that accommodating subject-specific sleep durations enables the identification of stage-specific temporal associations with AHI severity that are attenuated or obscured by coarser staging and conventional scalar analyses.
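Before any functional regression, each hypnogram is an epoch-by-epoch stage sequence whose length differs between subjects. A minimal sketch of the preprocessing implied above: collapsing five stages to three or two, extracting stage bouts, and building a 0/1 stage-indicator trajectory on each subject's own time domain. The 30-second epoch length and all names are assumptions; the VDFR model fit itself is not shown.

```python
from itertools import groupby

EPOCH_MIN = 0.5  # assumed 30-second scoring epochs, expressed in minutes

FIVE_TO_THREE = {"W": "W", "N1": "NREM", "N2": "NREM", "N3": "NREM", "REM": "REM"}
FIVE_TO_BINARY = {"W": "W", "N1": "S", "N2": "S", "N3": "S", "REM": "S"}


def collapse(hypnogram: list[str], mapping: dict[str, str]) -> list[str]:
    """Map five-stage labels to a coarser staging scheme."""
    return [mapping[stage] for stage in hypnogram]


def bouts(hypnogram: list[str], epoch_min: float = EPOCH_MIN) -> list[tuple[str, float]]:
    """Consecutive runs of the same stage as (stage, duration in minutes)."""
    return [(stage, len(list(run)) * epoch_min) for stage, run in groupby(hypnogram)]


def indicator_trajectory(hypnogram: list[str], stage: str,
                         epoch_min: float = EPOCH_MIN) -> list[tuple[float, int]]:
    """(time in minutes, 1 if in `stage` else 0) on this subject's own domain."""
    return [(i * epoch_min, int(s == stage)) for i, s in enumerate(hypnogram)]


night = ["W", "W", "N1", "N2", "N2", "REM"]
print(bouts(night))                               # five-stage bouts
print(bouts(collapse(night, FIVE_TO_BINARY)))     # binary staging merges sleep bouts
```

The last line shows why coarser staging can attenuate effects: distinct N1, N2 and REM bouts merge into a single "sleep" bout.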

17

A ReAct Agentic AI System for Natural Language Querying and Statistical Analysis of The Cancer Genome Atlas Clinical Data

Korutla, R.; Amal, S.

2026-07-17 health informatics 10.64898/2026.07.15.26358188 medRxiv

Top 10%

0.1%

The Cancer Genome Atlas (TCGA) holds clinical data for over 11,000 patients across 33 cancer types, but these data are difficult to access because of complex file structures, heterogeneous formats, and the need for programming expertise. We present an agentic system for natural language querying and statistical analysis of TCGA clinical data. The system uses a large language model as an autonomous ReAct agent that selects from eight computational tools, including data extraction, descriptive statistics, Kaplan-Meier survival analysis with log-rank tests, hypothesis testing, and verification against the curated TCGA Pan-Cancer Clinical Data Resource (CDR). The agent reasons about intermediate results, adapts its approach, and returns clinically contextualized responses with source attribution and auditable traces. We introduce TCGA-Agent-Bench, a set of 440 queries across five difficulty tiers with ground truth from the independently curated TCGA-CDR, evaluated with dual metrics of numerical accuracy and clinical completeness. The system achieves 93.4% overall accuracy (100% on single-patient lookups, 99.1% on cohort statistics, 92.8% on comparative analyses), outperforming a fixed rule-based pipeline (87.1%), a single-pass LLM (81.8%), and retrieval-augmented generation (66.9% on a subset). Because most of the benchmark is answerable from the CDR alone, we assess the extraction layer's added value on fields the CDR lacks (drug treatments, TNM components, biomarkers, biospecimen metadata): on 26 queries targeting these fields, the full system answers 100% correctly versus 3.8% for a CDR-only approach. Ablations show that the reasoning loop has the largest impact (+9.1% accuracy, +22.0 completeness points). A tool-based agentic architecture enables accurate, auditable analysis of clinical repositories, with value driven by tool design and recovered fields rather than model scale.
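A ReAct agent alternates reasoning ("thought") steps with tool calls and feeds each tool's output back as an observation before deciding what to do next. The sketch below is a hypothetical, stripped-down illustration of that control loop, not the authors' system: the two toy tools (`extract_ages`, `describe`), the `scripted_policy` function standing in for the language model, and the patient records are all assumptions. The recorded `Trace` plays the role of the auditable reasoning trace mentioned in the abstract.

```python
from dataclasses import dataclass, field
from statistics import mean, median


@dataclass
class Trace:
    """Auditable record of (thought, action, observation) steps."""
    steps: list = field(default_factory=list)


def extract_ages(cohort, data):
    """Hypothetical extraction tool: ages of patients with one cancer type."""
    return [p["age"] for p in data if p["cancer_type"] == cohort]


def describe(values):
    """Hypothetical descriptive-statistics tool."""
    return {"n": len(values), "mean": mean(values), "median": median(values)}


def scripted_policy(question, trace):
    """Stand-in for the LLM: chooses the next action from the trace so far.

    A real ReAct agent would prompt a language model with the question and
    the trace; this fixed rule only mimics that control flow.
    """
    if not trace.steps:
        return "Need the cohort's ages first.", "extract_ages", question["cohort"]
    _, last_action, last_obs = trace.steps[-1]
    if last_action == "extract_ages":
        return "Summarise the extracted ages.", "describe", last_obs
    return "Enough information to answer.", "finish", last_obs


def react(question, data, policy=scripted_policy, max_steps=5):
    """Run the thought -> action -> observation loop until 'finish'."""
    tools = {
        "extract_ages": lambda cohort: extract_ages(cohort, data),
        "describe": describe,
    }
    trace = Trace()
    for _ in range(max_steps):
        thought, action, arg = policy(question, trace)
        if action == "finish":
            return arg, trace
        observation = tools[action](arg)
        trace.steps.append((thought, action, observation))
    raise RuntimeError("step budget exhausted without an answer")


patients = [
    {"cancer_type": "BRCA", "age": 50},
    {"cancer_type": "BRCA", "age": 60},
    {"cancer_type": "LUAD", "age": 70},
]
answer, trace = react({"cohort": "BRCA"}, patients)
print(answer)
for step in trace.steps:
    print(step)
```

Keeping tool outputs in an explicit trace, rather than only in the model's context, is what makes each intermediate result inspectable after the fact.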

18

Bayesian shared-component spatiotemporal modeling of sexually transmitted infection co-occurrence: identifying geographic vulnerability across 204 countries, 1990-2023

Ma, Q.; Zhang, T.; Lin, D.; Zou, W.

2026-07-21 epidemiology 10.64898/2026.07.19.26358422 medRxiv

Top 10%

0.1%

Objectives: Although HIV incidence has declined in some settings, the overall global burden of sexually transmitted infections remains a major public health concern. In the context of the World Health Organization's call for people-centred STI prevention and care, identifying the shared geographic pattern of multiple STIs using data-driven analysis may help detect vulnerable areas and inform integrated prevention strategies. Methods: We analysed country-level incidence counts from the Global Burden of Disease 2023 study for 204 countries and territories over 1990-2023. A Bayesian shared-component spatiotemporal model was fitted, decomposing each disease's log-rate into a shared spatial component (scaled intrinsic conditional autoregressive prior), disease-specific spatial deviations, disease-specific first-order random walk temporal effects, and five socioeconomic covariates, with a negative binomial likelihood to accommodate overdispersion. The shared spatial score, defined as the posterior mean of the shared spatial component, was used as a continuous index of STI co-occurrence burden. Posterior exceedance probabilities quantified directional stability. External validity was assessed via Spearman correlation with the Socio-demographic Index (SDI) and generalised estimating equation regression of HIV/AIDS mortality on the shared score. Results: The shared spatial score exhibited marked geographic heterogeneity. The five highest-scoring countries were Eswatini (2.25), Lesotho (2.13), Malawi (1.90), Mozambique (1.89), and South Africa (1.85), all in southern Africa. Fifty-seven countries had high directional stability (posterior exceedance probability >0.95), concentrated in sub-Saharan Africa and the Caribbean. The score correlated negatively with SDI (Spearman rho = -0.619, p = 6.4 × 10^-23) and positively with HIV/AIDS mortality (incidence rate ratio = 14.64 per standard deviation, 95% CI: 11.90-18.01). Prior sensitivity analysis confirmed near-perfect ranking stability (rho ≥ 0.9999). Conclusions: STI co-occurrence is geographically concentrated, with the highest shared burden in sub-Saharan Africa and persistently elevated shared spatial signals also observed in parts of mainland Southeast Asia and the Caribbean. The shared spatial score provides a data-driven tool for prioritising integrated STI screening and prevention resources across countries.
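The sketch below illustrates, under stated assumptions, two quantities named in the abstract: the additive decomposition of a disease's log-rate and the summaries computed from posterior draws of the shared spatial component (its posterior mean, the "shared spatial score", and an exceedance probability). The function names are hypothetical, the draws are simulated rather than taken from a fitted model, and it assumes that "directional stability" refers to the posterior probability that a country's shared component exceeds zero.

```python
import random


def log_rate(intercept, shared, specific, temporal, betas, covariates):
    """Additive log-rate for one disease, country and year (illustrative).

    shared:   shared spatial effect for the country (common to all STIs)
    specific: disease-specific spatial deviation for the country
    temporal: disease-specific random-walk effect for the year
    """
    return intercept + shared + specific + temporal + sum(
        b * x for b, x in zip(betas, covariates)
    )


def shared_score(draws):
    """Posterior mean of the shared spatial component for one country."""
    return sum(draws) / len(draws)


def exceedance_probability(draws, threshold=0.0):
    """Fraction of posterior draws above the threshold (assumed to be 0)."""
    return sum(d > threshold for d in draws) / len(draws)


# Simulated posterior draws for two hypothetical countries.
rng = random.Random(2023)
high = [rng.gauss(1.5, 0.5) for _ in range(4000)]
unclear = [rng.gauss(0.0, 1.0) for _ in range(4000)]

for name, draws in [("high", high), ("unclear", unclear)]:
    p = exceedance_probability(draws)
    # A country is flagged as directionally stable when p exceeds 0.95.
    print(name, round(shared_score(draws), 2), round(p, 3), p > 0.95)
```

A country can have a moderately high score yet a low exceedance probability if its posterior is wide, which is why the two summaries are reported separately.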

19

Implementing the National Alzheimer's Coordinating Center Uniform Data Set (v3) within the Diabetes Prevention Program Outcomes Study

Doherty, L.; Dechiario, I.; Sherif, H.; Bowers, A.; Martinez, D.; Sanchez, D. L.; Febres, G. J.; Carmichael, O.; Shah, V.; Nadkarni, N. K.; Goldberg, T. E.; Noble, J. M.; Luchsinger, J. A.; Temprosa, M.; Research Group, D.

2026-07-21 epidemiology 10.64898/2026.07.17.26357765 medRxiv

Top 10%

0.1%

INTRODUCTION: The Diabetes Prevention Program (DPP) was a randomized clinical trial designed to prevent type 2 diabetes (T2D) in adults with prediabetes. The DPP Outcomes Study (DPPOS) is the 30-year follow-up of this cohort, focusing on T2D, prediabetes, and related complications. Cognitive assessments began in 2009 and expanded in 2022 to examine cognitive impairment, including Alzheimer's disease (AD) and AD-related dementias (ADRD), in the surviving cohort. To support these aims, the National Alzheimer's Coordinating Center Uniform Data Set version 3 (NACC-UDSv3), the standardized framework used by Alzheimer's Disease Research Centers, was implemented in DPPOS in 2022 to enable data sharing with NACC. These forms were complemented by cognitive tests administered in DPPOS. We aimed to integrate the NACC-UDSv3 into the existing longitudinal DPPOS framework while maintaining fidelity to its structure, and to develop automated reports to streamline cognitive outcomes adjudication. METHODS: Items from the 16 NACC-UDSv3 data forms were compared with those already collected within DPPOS in order to integrate overlapping or similar items, add missing NACC-UDSv3 items, and create a dataset harmonized with NACC-UDSv3. Forms were adapted for electronic data capture (EDC) using MIDAS (Multimodal Integrated Data Acquisition System, George Washington University). Automated reports integrated current and prior neuropsychological scores to support adjudications. In the first wave of the DPPOS-AD/ADRD study, 1,561 cognitive adjudications were successfully completed using the harmonized DPPOS and NACC-UDSv3 data implemented in MIDAS. DISCUSSION: The DPPOS-AD/ADRD project demonstrated that NACC-UDSv3 can be successfully integrated into a long-standing longitudinal cohort not originally designed for AD/ADRD research. The harmonization, electronic capture, and automated adjudication processes may provide a practical framework for other cohorts seeking to incorporate NACC-UDSv3 to align with national AD/ADRD research standards.
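The harmonization step described in the abstract (reusing overlapping or similar items, adding missing NACC-UDSv3 items) can be viewed as a set comparison through a crosswalk. The sketch below is a hypothetical illustration only: the function, the item names, and the `crosswalk` mapping are invented and do not correspond to actual NACC-UDSv3 or DPPOS variables.

```python
def harmonize(target_items, cohort_items, crosswalk):
    """Classify target (NACC-UDSv3-style) items against a cohort's items.

    crosswalk maps a cohort item name to the target item it can stand in for
    (an overlapping or similar item). Returns which target items can reuse
    existing cohort data, which must be newly added, and which cohort items
    have no target counterpart.
    """
    target = set(target_items)
    # Items with identical names map to themselves; others go via the crosswalk.
    covered = {crosswalk.get(item, item) for item in cohort_items} & target
    return {
        "reuse": [i for i in target_items if i in covered],
        "add": [i for i in target_items if i not in covered],
        "cohort_only": [
            i for i in cohort_items if crosswalk.get(i, i) not in target
        ],
    }


# Hypothetical item names for illustration only.
target_items = ["educ_years", "sex", "moca_total", "cdr_global"]
cohort_items = ["education", "sex", "hba1c"]
crosswalk = {"education": "educ_years"}
result = harmonize(target_items, cohort_items, crosswalk)
print(result)
```

In practice the crosswalk is where the domain judgment lives (deciding which cohort items are similar enough to stand in for a target item); the set logic around it is mechanical.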

20

An ancestry-matched Mendelian randomisation analysis of kidney function and heart failure subtypes in African ancestry populations

Gaye, N. D.; Diawara, A.

2026-07-17 genetic and genomic medicine 10.64898/2026.07.15.26358145 medRxiv

Top 10%

0.1%

Chronic kidney disease and heart failure disproportionately burden populations of African ancestry, yet Mendelian randomisation (MR) studies of the causal relationship between kidney function and heart failure subtypes have been conducted exclusively in European ancestry populations. We performed a forward two-sample MR analysis to evaluate the causal effect of genetically predicted estimated glomerular filtration rate (eGFR) on heart failure with preserved ejection fraction (HFpEF) and heart failure with reduced ejection fraction (HFrEF) in individuals of African ancestry. Genetic instruments were selected from an African ancestry eGFR genome-wide association study (N = 67,943) at genome-wide significance, with linkage disequilibrium clumping using an African ancestry reference panel. Heart failure subtype summary statistics were obtained from the Million Veteran Program (HFpEF: 5,379 cases / 113,041 controls; HFrEF: 9,104 cases / 109,632 controls). Six independent SNPs (F-statistics 30.5-107.3; R² = 0.62%) were retained as instruments. The primary inverse-variance weighted analysis provided no evidence of a causal effect of eGFR on HFpEF (OR 0.92, 95% CI 0.80-1.06, p = 0.248) or HFrEF (OR 0.98, 95% CI 0.78-1.23, p = 0.878). Sensitivity analyses were directionally consistent. There was no evidence of heterogeneity or directional pleiotropy. Minimum detectable effects at 80% power were OR 1.28 for HFpEF and OR 1.22 for HFrEF. These null findings should be interpreted as inconclusive given current power constraints; larger ancestry-matched studies are needed.
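For readers unfamiliar with the primary estimator, a minimal sketch of the fixed-effect inverse-variance weighted (IVW) MR estimate with first-order weights is shown below, together with the common approximation F ≈ (β_X / SE_X)² for per-SNP instrument strength. The function names and inputs are illustrative assumptions; no summary statistics from the study are reproduced, and published analyses often use a multiplicative random-effects standard error instead of the fixed-effect one computed here.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW causal estimate with first-order weights.

    beta_x: SNP-exposure effects (e.g. per-SNP effect on eGFR)
    beta_y: SNP-outcome effects on the log-odds scale
    se_y:   standard errors of beta_y
    Equivalent to a weighted average of per-SNP Wald ratios beta_y / beta_x.
    """
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y)]
    numerator = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    estimate = numerator / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


def odds_ratio_ci(estimate, se, z=1.959964):
    """Convert a log-odds estimate into an OR with a 95% confidence interval."""
    return math.exp(estimate), math.exp(estimate - z * se), math.exp(estimate + z * se)


def f_statistic(beta_x, se_x):
    """Approximate per-SNP instrument strength, F ~ (beta_x / se_x)^2."""
    return (beta_x / se_x) ** 2


# Made-up inputs: outcome effects exactly proportional to exposure effects,
# so every Wald ratio (and hence the IVW estimate) equals 0.5.
bx = [0.02, 0.03, 0.05]
by = [0.5 * b for b in bx]
sy = [0.01, 0.012, 0.015]
est, se = ivw_estimate(bx, by, sy)
print(round(est, 6), odds_ratio_ci(est, se))
print(f_statistic(0.03, 0.004))
```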